A Unified Metric for Categorical and Numerical Attributes in Data Clustering
Authors
Yiu-ming Cheung and Hong Jia, Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
Abstract
Most existing clustering approaches handle either purely numerical or purely categorical data, but not both. In general, clustering mixed data composed of numerical and categorical attributes is a nontrivial task because there is an awkward gap between the similarity metrics for categorical data and those for numerical data. This paper therefore presents a unified metric for data clustering in which the attributes may be of any of three types: numerical, categorical, or a mixture of both. We first present a general clustering framework based on the concept of object-cluster similarity. Then, a unified metric of object-cluster similarity is presented. Finally, an iterative clustering algorithm is developed, which is directly applicable to the three data types stated above without any adjustment. Experimental results show the efficacy of the proposed approach.
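The abstract describes an iterative algorithm driven by object-cluster similarity but gives no formulas, so the following is only a minimal sketch of that general idea and not the authors' method. It assumes, purely for illustration, that the categorical similarity is the share of cluster members holding the same value, that the numerical similarity is a softmax over squared distances to the cluster means, and that the two are averaged with equal weight. The function names, the round-robin initialisation, and the equal weighting are all hypothetical.

```python
import math
from collections import Counter


def categorical_similarity(value, cluster_values):
    """Share of cluster members holding `value` (assumed form, not from the paper)."""
    if not cluster_values:
        return 0.0
    return Counter(cluster_values)[value] / len(cluster_values)


def numerical_similarities(obj, clusters, num_idx):
    """Softmax-style similarity of obj's numerical part to every cluster mean (assumed form)."""
    dists = []
    for members in clusters:
        if not members:
            dists.append(math.inf)  # an empty cluster attracts nothing
            continue
        mean = [sum(m[i] for m in members) / len(members) for i in num_idx]
        dists.append(sum((obj[i] - c) ** 2 for i, c in zip(num_idx, mean)))
    d_min = min(dists)  # subtract the minimum to avoid underflow to all zeros
    weights = [math.exp(-0.5 * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def similarities(obj, clusters, cat_idx, num_idx):
    """Object-cluster similarity to each cluster, averaged with equal weights (an assumption)."""
    k = len(clusters)
    num_sims = numerical_similarities(obj, clusters, num_idx) if num_idx else [0.0] * k
    parts = len(cat_idx) + (1 if num_idx else 0)
    scores = []
    for j, members in enumerate(clusters):
        cat = sum(categorical_similarity(obj[i], [m[i] for m in members]) for i in cat_idx)
        scores.append((cat + num_sims[j]) / parts)
    return scores


def cluster(data, k, cat_idx, num_idx, max_iter=100):
    """Reassign every object to its most similar cluster until the labels stop changing."""
    labels = [i % k for i in range(len(data))]  # naive round-robin start (hypothetical choice)
    for _ in range(max_iter):
        clusters = [[x for x, l in zip(data, labels) if l == j] for j in range(k)]
        new_labels = []
        for x in data:
            s = similarities(x, clusters, cat_idx, num_idx)
            new_labels.append(max(range(k), key=s.__getitem__))
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

The same `cluster` call accepts purely categorical data (`num_idx=[]`), purely numerical data (`cat_idx=[]`), or both. For instance, `cluster([("a", 1.0), ("a", 1.2), ("a", 0.9), ("b", 5.0), ("b", 5.1), ("b", 4.8)], 2, [0], [1])` separates the first three objects from the last three. The round-robin start can leave symmetric data stuck in ties, so a practical version would need a better initialisation.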
Similar resources
Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number
Most of the existing clustering approaches are applicable to purely numerical or categorical data only, but not both. In general, it is a nontrivial task to perform clustering on mixed data composed of numerical and categorical attributes because there exists an awkward gap between the similarity metrics for categorical and numerical data. This paper therefore presents a general clustering ...
Numerical and Categorical Attributes Data Clustering Using K-Modes and Fuzzy K-Modes
Most of the existing clustering approaches are applicable to purely numerical or categorical data only, but not both. In general, it is a nontrivial task to perform clustering on mixed data composed of numerical and categorical attributes because there exists an awkward gap between the similarity metrics for categorical and numerical data. This paper therefore presents a general clustering ...
A Clustering Algorithm for Categorical Data Based on Combining Measures
Clustering is one of the main techniques in data mining. It partitions a data set into groups such that the data within a cluster are as close to each other as possible and the data in different clusters differ as much as possible. Clustering algorithms are divided into two categories according to the type of data: clustering algorithms for numerical data and clustering algor...
Integrative Parameter-Free Clustering of Data with Mixed Type Attributes
Integrative mining of heterogeneous data is one of the major challenges for data mining in the next decade. We address the problem of integrative clustering of data with mixed type attributes. Most existing solutions suffer from one or both of the following drawbacks: Either they require input parameters which are difficult to estimate, or/and they do not adequately support mixed type attribute...
A framework for comparing heterogeneous objects: on the similarity measurements for fuzzy, numerical and categorical attributes
Real-world data collections are often heterogeneous (represented by a set of mixed attributes data types: numerical, categorical and fuzzy); since most available similarity measures can only be applied to one type of data, it becomes essential to construct an appropriate similarity measure for comparing such complex data. In this paper, a framework of new and unified similarity measures is prop...
Publication date: 2013